
An EM algorithm based on an internal list for estimating haplotype distributions of rare variants from pooled genotype data.(Methodology article)(Report)


Abstract

Background: Pooling is a cost-effective way to collect data for genetic association studies, particularly for rare genetic variants. Haplotype frequencies are of interest because they contain more information than single-locus statistics. If the pooled genotype data are viewed as incomplete data, the expectation-maximization (EM) algorithm is the natural estimation method, but it is computationally intensive. A recent proposal to reduce the computational burden uses database information to form a list of frequently occurring haplotypes and restricts the EM algorithm to haplotypes on this list. However, there is a danger of using an incorrect list, and in some applications there may not be enough database information to form a list externally.

Results: We investigate the possibility of creating an internal list from the data at hand. One way to form such a list is to collapse the observed total minor allele frequencies to "zero" or "at least one", which is shown to have the desirable effect of amplifying the haplotype frequencies. To improve coverage, we propose ways to add haplotypes to and remove them from the list, and a benchmarking method to determine the frequency threshold for removing haplotypes. Simulation results show that the EM estimates based on a suitably augmented and trimmed collapsed data list (ATCDL) perform satisfactorily. In two scenarios involving 25 and 32 loci respectively, the EM-ATCDL estimates outperform both the EM estimates based on other lists and the collapsed data maximum likelihood estimates.

Conclusions: The proposed augmented and trimmed collapsed data (CD) list is a useful list on which to base the EM algorithm when estimating the haplotype distributions of rare variants. It can handle more markers and larger pool sizes than existing methods, and the resulting EM-ATCDL estimates are more efficient than the EM estimates based on other lists.
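The abstract combines two ideas: collapsing each pool's minor-allele totals to "zero" or "at least one" to build an internal haplotype list, and running EM restricted to that list. The Python below is a minimal sketch under stated assumptions. It is not the authors' implementation. The assumptions are as follows. Pooled data are given as per-locus minor-allele totals for each pool, and each pool holds `pool_size` diploid individuals. The internal list is taken, hypothetically, to be the distinct collapsed pool patterns plus the all-reference haplotype. The augmentation and trimming steps are not shown. All function names are illustrative. The E-step enumerates every multiset of listed haplotypes that reproduces a pool's totals by brute force, so it is only practical for toy sizes.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial, prod


def collapse(pool_counts):
    """Collapse each pool's per-locus minor-allele totals to 0 ("zero") or 1 ("at least one")."""
    return [tuple(1 if c > 0 else 0 for c in counts) for counts in pool_counts]


def internal_list(pool_counts):
    """Hypothetical internal list: distinct collapsed patterns plus the all-reference haplotype."""
    n_loci = len(pool_counts[0])
    patterns = set(collapse(pool_counts))
    patterns.add((0,) * n_loci)
    return sorted(patterns)


def multinomial(counts):
    """Number of orderings of a multiset with the given multiplicities."""
    return factorial(sum(counts)) // prod(factorial(c) for c in counts)


def consistent_configs(observed, haplos, n_haps):
    """Brute force: every multiset of n_haps listed haplotypes whose locus-wise sum equals observed."""
    target = tuple(observed)
    configs = []
    for combo in combinations_with_replacement(range(len(haplos)), n_haps):
        sums = tuple(sum(haplos[i][j] for i in combo) for j in range(len(target)))
        if sums == target:
            configs.append(Counter(combo))
    return configs


def em_pooled(pool_counts, haplos, pool_size, max_iter=500, tol=1e-10):
    """EM estimate of haplotype frequencies, restricted to the haplotypes in `haplos`."""
    n_haps = 2 * pool_size  # each pool holds pool_size diploid individuals
    configs = [consistent_configs(obs, haplos, n_haps) for obs in pool_counts]
    if any(not c for c in configs):
        raise ValueError("a pool cannot be explained by the haplotype list")
    p = [1.0 / len(haplos)] * len(haplos)
    for _ in range(max_iter):
        expected = [0.0] * len(haplos)
        for pool in configs:
            # E-step: posterior weight of each compatible configuration under the current p.
            weights = [multinomial(list(cfg.values())) * prod(p[h] ** n for h, n in cfg.items())
                       for cfg in pool]
            total = sum(weights)
            for cfg, w in zip(pool, weights):
                for h, n in cfg.items():
                    expected[h] += n * w / total
        # M-step: expected haplotype counts divided by the total number of pooled haplotypes.
        new_p = [e / (n_haps * len(pool_counts)) for e in expected]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


pools = [(0, 0), (1, 0), (0, 1), (0, 0)]
haps = internal_list(pools)               # [(0, 0), (0, 1), (1, 0)]
print(em_pooled(pools, haps, pool_size=1))  # [0.75, 0.125, 0.125]
```

A `ValueError` marks a pool whose totals cannot be reproduced from the list. For example, the totals `(2, 1)` cannot be formed from the list `[(0, 0), (1, 1)]`. This is the kind of coverage gap that the abstract's step of adding haplotypes is meant to close. The number of candidate configurations grows combinatorially with list length and pool size. That growth is the computational burden the restricted list is meant to contain.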

Bibliographic details

  • Authors

    Li, X; Kuk, AYC; Xu, J

  • Year: 2013
  • Original format: PDF
  • Language: English
